Abstract. We present a symbolic-execution-based algorithm that, for a
given program and a given program location, produces a nontrivial
necessary condition on input values to drive the program execution to the
given location. We also propose an application of necessary conditions
in contemporary bug-finding and test-generation tools. Experimental
results show that the presented technique can significantly improve the
performance of these tools.



1 Introduction 

Symbolic execution [18, 17] produces, for each path in a program starting in the 
initial location, a formula called path condition, i.e. the necessary and sufficient 
condition on input data to drive the execution along the path. Symbolic execu- 
tion serves as a basis in many successful tools for test generation and bug finding, 
for example Klee [4], Exe [5], Pex [27], Sage [11], or Cute [25]. These tools 
can relatively quickly find tests that cover a significant part of a given code. 
However, the ratio of covered code further increases very slowly or not at all. 
Here we present a method that helps the tools to cover a chosen program location 
and hence to further improve their performance. 

The core of our method and the main contribution of the paper is a
symbolic-execution-based algorithm that, for a given program and a given program
location, produces a nontrivial necessary condition on input values to drive the
program execution to the given location. The basic idea of this algorithm is to
replace each program loop by a summary of its effect on both program
variables and path conditions. The algorithm is intuitively explained in Section 2
and precisely described in Section 3.

The algorithm usually produces necessary conditions with quantifiers. In 
spite of recent advances in SMT solving, current SMT solvers often fail to quickly 
decide satisfiability of quantified formulae. We employ a specific structure of 
necessary conditions and introduce a transformation of a quantified necessary 
condition into a more general but quantifier-free necessary condition. The trans- 
formation is presented in Section 4. 

Section 5 proposes possible applications of necessary conditions in
test-generation tools. In principle, the application of necessary conditions can speed
up recognition of unreachable locations as well as discovery of tests reaching
uncovered locations. Experimental results provided in Section 6 show that the
proposed method can significantly improve the performance of the tools.




Fig. 1. Flowgraph of the running example (left), flowgraph P({c, d, e, f}, c) induced by
its loop {c, d, e, f} with entry node c (middle), and flowgraph with more loops (right).

Finally, Section 7 discusses some related work and Section 8 concludes the 
paper. 

2 Outline of the Algorithm 

An intuitive explanation of the algorithm is illustrated on the following simple 
program, where we want to compute a necessary condition to reach the assertion 
on the last line. 

    k = 0;
    for (i = 3; i < n; ++i) {
        if (A[i] == 1)
            ++k;
    }
    if (k > 12)
        assert(false);

The relevant part of the program can be represented as the flowgraph of Figure 1 
(left), where the node h corresponds to the target location. 

Our algorithm works on flowgraphs. In the following, by complete path we
mean a path in the flowgraph leading from the initial to the target location.
If a given flowgraph contains only finitely many complete paths $\pi_1, \ldots, \pi_n$, one
can compute a necessary (and sufficient) condition very easily: using symbolic
execution, we compute a path condition $\varphi_i$ for each complete path $\pi_i$ and we
construct the necessary condition as

$$\varphi = \bigvee_{1 \le i \le n} \varphi_i.$$

Unfortunately, the number of complete paths is usually infinite as flowgraphs
typically contain cycles. However, the number of acyclic complete paths is
always finite. Therefore, we associate to each complete path $\pi$ one acyclic complete
path called the backbone of $\pi$. The backbone is defined by the following
procedure: If $\pi$ is acyclic, then the backbone is directly $\pi$. Otherwise, we find the
leftmost repeating node in $\pi$, remove the part of $\pi$ between the first and the last
occurrence of this node (including the last occurrence), and repeat the
procedure. In other words, the backbone arises from $\pi$ by removing all cycles. Note
that the cycles correspond to iterations of program loops.
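
The backbone procedure can be sketched in a few lines of Python. This is only an illustration under our own assumptions: a path is given as a sequence of node names, and the function name `backbone` is hypothetical, not part of the paper.

```python
def backbone(path):
    """Return the backbone of `path` (a sequence of node names) as a list.

    Repeatedly finds the leftmost node that occurs more than once and removes
    everything after its first occurrence up to and including its last one.
    """
    nodes = list(path)
    while True:
        for first, node in enumerate(nodes):
            if nodes.count(node) > 1:
                # index of the last occurrence of the repeating node
                last = len(nodes) - 1 - nodes[::-1].index(node)
                nodes = nodes[:first + 1] + nodes[last + 1:]
                break
        else:
            # no node repeats: the path is acyclic
            return nodes
```

For instance, the complete path a b c d e f c d f c g h of the running example (one iteration through e, one iteration skipping it) collapses to a b c g h.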

The algorithm finds all backbones in a given flowgraph. For each backbone
p, it computes an abstract path condition. This path condition is called abstract 
as it represents not only the backbone, but all complete paths with backbone 
p. More precisely, the abstract path condition is implied by each path condition 
corresponding to a complete path with backbone p. The resulting necessary 
formula is a disjunction of all abstract path conditions. 

The crucial step in the computation of an abstract path condition for a backbone
is to identify all loop entry nodes lying on this backbone and compute summaries
of the corresponding loops. We demonstrate the computation of a loop summary
on our running example. The flowgraph of Figure 1 (left) has only one backbone
abcgh. The backbone contains one loop entry node c entering the loop {c, d, e, f}.
The computation of a summary is based on an analysis of paths going around
the loop from the entry node back to it. To simplify the analysis, we first extract
the loop from the original flowgraph and then we analyse its complete paths. In
Figure 1 (middle), there is a flowgraph representing the loop {c, d, e, f} of our
example. This flowgraph contains two complete paths that we need to analyse,
namely $\pi_1 = cdefc'$ and $\pi_2 = cdfc'$.

The first part of a loop summary is a description of the overall effect of the
loop on variable values, from the first visit of the loop entry node to the last
visit of the node. The effect is described by an iterated symbolic state, which is a
function that assigns to each program variable its value given by an expression
over symbols and path counters. Symbols represent initial values of variables at
the first visit of the entry node (for each variable a, we denote its symbol by
$\underline{a}$). Path counters $\kappa_1, \kappa_2, \ldots$ correspond to different backbones of the flowgraph
representing the extracted loop. Each path counter represents the number of
loop iterations along the corresponding backbone.

In our example, we assign path counters $\kappa_1, \kappa_2$ to backbones $\pi_1, \pi_2$
respectively. The overall effect of the loop with respect to the entry node c can be
described by the iterated symbolic state $\theta^{\vec{\kappa}}$ with only two interesting values (as
the other variables are not changed in the loop):

$$\theta^{\vec{\kappa}}(\mathtt{i}) = \kappa_1 + \kappa_2 + \underline{i} \qquad \theta^{\vec{\kappa}}(\mathtt{k}) = \kappa_1 + \underline{k}$$

In other words, by $\kappa_1$ iterations of $\pi_1$ and $\kappa_2$ iterations of $\pi_2$ executed in an
arbitrary order, the values of i and k are increased by $\kappa_1 + \kappa_2$ and $\kappa_1$, respectively.
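
The iterated symbolic state can be checked against a concrete run. The following sketch is our own illustration, not part of the paper: it executes the loop of the running example on concrete data, counts how many iterations take each of the two paths, and returns those counts so that they can be compared with the closed forms above. The name `run_loop` is hypothetical.

```python
def run_loop(i, k, n, A):
    """Run the loop of the running example from the state (i, k).

    Returns the final values of i and k together with the number of
    iterations that incremented k (kappa1) and that skipped it (kappa2).
    Assumes len(A) >= n so that every access A[i] is in bounds.
    """
    kappa1 = kappa2 = 0
    while i < n:
        if A[i] == 1:
            k += 1
            kappa1 += 1
        else:
            kappa2 += 1
        i += 1
    return i, k, kappa1, kappa2


i0, k0 = 3, 0
i_end, k_end, kappa1, kappa2 = run_loop(i0, k0, 7, [0, 0, 0, 1, 0, 1, 1])
```

In this run the final state satisfies `i_end == kappa1 + kappa2 + i0` and `k_end == kappa1 + k0`, matching the two equations.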

The second part of a loop summary is a looping condition $\varphi^{\vec{\kappa}}$. Given path
counters $\kappa_1, \ldots, \kappa_n$ corresponding to acyclic paths $\pi_1, \ldots, \pi_n$ around the loop,
the formula $\varphi^{\vec{\kappa}}$ describes a necessary condition to perform $\sum_{1 \le i \le n} \kappa_i$ iterations
of the loop such that, for each $i$, exactly $\kappa_i$ iterations use path $\pi_i$. In other
words, $\varphi^{\vec{\kappa}}$ has to be implied by each path condition $\varphi$ corresponding to such
$\sum_{1 \le i \le n} \kappa_i$ iterations of the loop. Note that, for each test on a path $\pi_i$, we know that
this test appears exactly $\kappa_i$ times in $\varphi$. Of course, all the variables in this test
can be evaluated to different symbolic expressions each time. We construct the
looping condition $\varphi^{\vec{\kappa}}$ as $\psi_1 \wedge \ldots \wedge \psi_n$. Each subformula $\psi_i$ says that, for each
of the $\kappa_i$ iterations along the path $\pi_i$, all tests on the path must be satisfied
for some possible values of variables, i.e. for some values given by the iterated
symbolic state and some admissible values of path counters.

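In the example, the looping condition $\varphi^{\vec{\kappa}}$ for the loop {c, d, e, f} with the
entry node c has the form $\psi_1 \wedge \psi_2$. We focus on the construction of $\psi_1$ which
corresponds to path $\pi_1$ containing two tests: i < n and A[i] == 1. The iterated
symbolic state says that values of i, n, and A[i] in the $(\tau_1 + 1)$-st iteration of $\pi_1$
and after $\tau_2$ iterations of $\pi_2$ are $\tau_1 + \tau_2 + \underline{i}$, $\underline{n}$, and $\underline{A}(\tau_1 + \tau_2 + \underline{i})$ respectively. To
complete this iteration of $\pi_1$, the conditions $\tau_1 + \tau_2 + \underline{i} < \underline{n}$ and $\underline{A}(\tau_1 + \tau_2 + \underline{i}) = 1$
have to be satisfied. In general, if we want to make $\kappa_1$ iterations of $\pi_1$ and $\kappa_2$
iterations of $\pi_2$, the formula $\psi_1$ says that for each $\tau_1$ satisfying $0 \le \tau_1 < \kappa_1$
there has to be some $\tau_2$ satisfying $0 \le \tau_2 \le \kappa_2$ such that $\tau_1 + \tau_2 + \underline{i} < \underline{n}$ and
$\underline{A}(\tau_1 + \tau_2 + \underline{i}) = 1$. The complete looping condition for our example is as follows:

$$\begin{aligned}
\varphi^{\vec{\kappa}} &= \psi_1 \wedge \psi_2 \\
\psi_1 &= \forall \tau_1 \big( 0 \le \tau_1 < \kappa_1 \rightarrow \exists \tau_2 ( 0 \le \tau_2 \le \kappa_2 \wedge \tau_1 + \tau_2 + \underline{i} < \underline{n} \wedge \underline{A}(\tau_1 + \tau_2 + \underline{i}) = 1 ) \big) \\
\psi_2 &= \forall \tau_2 \big( 0 \le \tau_2 < \kappa_2 \rightarrow \exists \tau_1 ( 0 \le \tau_1 \le \kappa_1 \wedge \tau_1 + \tau_2 + \underline{i} < \underline{n} \wedge \underline{A}(\tau_1 + \tau_2 + \underline{i}) \ne 1 ) \big)
\end{aligned}$$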

Finally, we assign loop summaries to the corresponding loop entry nodes on 
a backbone and we apply (a slight modification of) symbolic execution to get an 
abstract path condition for the backbone. 

In our running example, the symbolic execution of the backbone abcgh
proceeds as follows. The first two edges of the backbone are executed in the standard
way. We get a symbolic state $\theta_1$ with only two interesting values (the other
variables are not changed)

$$\theta_1(\mathtt{i}) = 3 \qquad \theta_1(\mathtt{k}) = 0$$

and a path condition $\gamma_1 = \mathit{true}$. As we are now in the loop entry node c, we have
to process the loop summary $(\theta^{\vec{\kappa}}, \varphi^{\vec{\kappa}})$. The composition of $\theta_1$ and $\theta^{\vec{\kappa}}$ results in
a symbolic state $\theta_2$ representing variable values after the loop:

$$\theta_2(\mathtt{i}) = \kappa_1 + \kappa_2 + 3 \qquad \theta_2(\mathtt{k}) = \kappa_1$$

We also extend the path condition $\gamma_1$ with the looping condition $\varphi^{\vec{\kappa}}$, where
the symbols $\underline{i}, \underline{k}$ are replaced by $\theta_1(\mathtt{i}), \theta_1(\mathtt{k})$ respectively. Hence, we get path
condition $\gamma_2 = \varphi^{\vec{\kappa}}[\underline{i}/3, \underline{k}/0]$. At the end, we process the last two edges of the
backbone. The edges do not change variable values, but they extend the path
condition $\gamma_2$ with tests i >= n and k > 12 evaluated in abstract state $\theta_2$. Hence,
we get path condition

$$\gamma_3 = \varphi^{\vec{\kappa}}[\underline{i}/3, \underline{k}/0] \wedge \kappa_1 + \kappa_2 + 3 \ge \underline{n} \wedge \kappa_1 > 12.$$

To obtain the resulting abstract path condition for the backbone, we
existentially quantify all path counters in the formula $\gamma_3$ and we state that values of
the path counters have to be non-negative. Hence, the abstract path condition
for the backbone of our example is

$$\exists \kappa_1, \kappa_2 \, ( \kappa_1, \kappa_2 \ge 0 \wedge \varphi^{\vec{\kappa}}[\underline{i}/3, \underline{k}/0] \wedge \kappa_1 + \kappa_2 + 3 \ge \underline{n} \wedge \kappa_1 > 12 ).$$

The necessary condition $\varphi$ is then a disjunction of all abstract path conditions
for backbones. As there is only one backbone abcgh in our running example,
$\varphi$ is directly this abstract path condition.
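
To make the resulting formula concrete, the sketch below evaluates it by brute force for given inputs n and A and compares it with an actual run of the example program. It is only our illustration: the function names are hypothetical, the array is assumed to have at least n elements, and the search bound on the path counters is our own observation (the tests i < n inside the looping condition prevent either counter from exceeding n), not something stated in the paper.

```python
def reaches_assert(n, A):
    """Concrete run of the example program: is the assertion reached?"""
    k = 0
    for i in range(3, n):
        if A[i] == 1:
            k += 1
    return k > 12


def psi(k_self, k_other, n, A, want_one):
    """psi_1 (want_one=True) or psi_2 (want_one=False) with i replaced by 3."""
    return all(
        any(t_self + t_other + 3 < n
            and (A[t_self + t_other + 3] == 1) == want_one
            for t_other in range(k_other + 1))
        for t_self in range(k_self))


def necessary_condition(n, A):
    """Brute-force evaluation of the abstract path condition for abcgh."""
    bound = max(n, 0)  # assumption: counters above n cannot satisfy i < n
    return any(
        k1 > 12 and k1 + k2 + 3 >= n
        and psi(k1, k2, n, A, True) and psi(k2, k1, n, A, False)
        for k1 in range(bound + 1) for k2 in range(bound + 1))
```

On inputs that reach the assertion the condition holds, as a necessary condition must; on an input such as n = 15 it already fails, because thirteen iterations that increment k cannot fit between i = 3 and i = 14.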

To sum up, our technique produces a necessary formula $\varphi$ that has to be
satisfied by all inputs driving the execution to the target location. In general,
the formula is not a sufficient condition on inputs to reach the target node for
two reasons.

— It is not always possible to express the overall effect of a loop on a variable in 
the presented way. In such a case, the variable is assigned the special value 
★ with the meaning "unknown". If we symbolically execute a test containing
a variable with the value ★, we do not add this test to our path condition.

— The looping condition is constructed as a necessary but not a sufficient con- 
dition. More precisely, it checks whether tests in each iteration are satisfied 
for the iterated symbolic state with some admissible values of path coun- 
ters, but the consistency of these admissible values over all iterations is not 
checked. 

3 Precise Description of the Algorithm 

For simplicity and due to space limitations, we restrict our attention to programs 

manipulating only integer variables and read-only multidimensional integer ar- 
rays, and with no function calls. The advanced version of the algorithm, which 
is presented in [26] , works with programs that can modify arrays. Moreover, the 
algorithm can be extended to handle variables of other types, function calls, etc. 

3.1 Preliminaries 

A flowgraph is a tuple $P = (V, E, l_s, l_t, \iota)$, where $(V, E)$ is a finite oriented graph,
$l_s, l_t \in V$ are different start and target nodes respectively, and $\iota : E \to I$ is a
function assigning to each edge $e$ an instruction $\iota(e)$. A node is branching if its
out-degree is 2. All other nodes have out-degree at most 1. We use two kinds
of instructions: an assignment instruction a:=e for some scalar variable a and
some expression e, and an assumption assume($\gamma$) for some quantifier-free formula
$\gamma$ over program variables. Out-edges of any branching node are labeled with
assumptions assume($\gamma$) and assume($\neg\gamma$) for some $\gamma$. We often omit the keyword
assume in our examples. We assume that instructions only operate on scalar
variables a, b, ... of type Int and multi-dimensional array variables A, B, ... of
type $\mathtt{Int}^k \to \mathtt{Int}$. Note that we often identify programs with the corresponding
flowgraphs.

A path in a flowgraph is a finite sequence $\pi = v_1 v_2 \cdots v_k$ of nodes such that
$k > 0$ and $(v_i, v_{i+1}) \in E$ for all $1 \le i < k$. Paths are always denoted by Greek
letters. The terms complete path and its backbone have already been defined in
the previous section.

Now we formalize definitions of loops and loop entry nodes on a backbone.
Let $\pi$ be a backbone with a prefix $\alpha v$. There is a loop C with an entry node $v$
along $\pi$ if there exists a path $v \beta v$ such that no node of $\beta$ appears in $\alpha$. The loop
C is then the smallest set containing all nodes of all paths $v \beta v$ satisfying that no
node of $\beta$ appears in $\alpha$. For example, the flowgraph of Figure 1 (right) contains
two backbones: $\pi_1 = l_s b d l_t$ and $\pi_2 = l_s a b d l_t$. While $\pi_1$ contains only one loop
{a, b, c, d} with entry node b, $\pi_2$ contains loop {a, b, c, d} with entry node a and
loop {b, c} with entry node b.

For a loop C with an entry node $v$, the flowgraph induced by the loop, denoted
as $P(C, v)$, is the subgraph of the original flowgraph induced by C where $v$ is
marked as the start node, a fresh node $v'$ is added and marked as the target
node, and every edge $(u, v) \in E$ leading to $v$ is replaced by an edge $(u, v')$. An
example of an induced flowgraph is provided in Figure 1 (middle).
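
The construction of an induced flowgraph can be sketched as follows. This is a minimal illustration under our own assumptions: a flowgraph is reduced to a set of edges between named nodes (instructions are omitted), and the edge set used for the running example is reconstructed from the paths named in the text, since the figure itself is not reproduced here.

```python
def induced_flowgraph(edges, loop, entry):
    """Return (edges, start, target) of the flowgraph induced by `loop`.

    Keeps only edges between loop nodes and redirects every edge that leads
    back to `entry` to a fresh target node named entry + "'".
    """
    fresh = entry + "'"
    induced = set()
    for (u, w) in edges:
        if u in loop and w in loop:
            induced.add((u, fresh) if w == entry else (u, w))
    return induced, entry, fresh


# Assumed edge set of the running example (left part of Figure 1).
example_edges = {
    ("a", "b"), ("b", "c"), ("c", "d"), ("c", "g"), ("d", "e"),
    ("d", "f"), ("e", "f"), ("f", "c"), ("g", "h"),
}
loop_edges, start, target = induced_flowgraph(example_edges, set("cdef"), "c")
```

The induced flowgraph then contains exactly the two complete paths c d e f c' and c d f c' analysed in Section 2.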

By symbolic expressions we mean all expressions built with integers, standard
integer operations and functions, and

— a (constant) symbol $\underline{a}$ for each scalar variable a,

— a function symbol $\underline{A}$ for each array variable A, where arity of $\underline{A}$ corresponds
to the dimension of array A,

— a countable set $\{\kappa_1, \tau_1, \kappa_2, \tau_2, \ldots\}$ of variables called path counters, and

— a special constant symbol ★ called unknown.

Let $f, e_1, \ldots, e_n$ be symbolic expressions and $x_1, \ldots, x_n$ be path counters or
constant symbols corresponding to scalar variables. Then $f[x_1/e_1, \ldots, x_n/e_n]$ is
a symbolic expression $f$ where all occurrences of $x_i$ are replaced by $e_i$,
simultaneously for all $i$. To shorten the notation, we write $f[\vec{x}/\vec{e}]$ when the meaning is
clearly given by a context. We also use the notation $\varphi[\vec{x}/\vec{e}]$ with the analogous
meaning.

A symbolic state is a function $\theta$ assigning to each scalar variable a a symbolic
expression and to each array variable A the function symbol $\underline{A}$ (recall that our
programs do not change values of arrays). Let a be a scalar variable and $e$ be
a symbolic expression. Then $\theta[\mathtt{a} \to e]$ is a symbolic state equal to $\theta$ except
for variable a, where $\theta[\mathtt{a} \to e](\mathtt{a}) = e$. The notation $\theta(\cdot)$ is used in a more
general way. It always denotes the operation of replacing each (scalar or array)
variable a by $\theta(\mathtt{a})$. Similarly, $\theta\langle\cdot\rangle$ denotes the operation on symbolic expressions
or formulae, where each symbol $\underline{a}$ is replaced by $\theta(\mathtt{a})$. Additionally, $\theta_1\langle\theta_2\rangle$ denotes
composition of two symbolic states $\theta_1$ and $\theta_2$ satisfying $\theta_1\langle\theta_2\rangle(\mathtt{a}) = \theta_1\langle\theta_2(\mathtt{a})\rangle$ for
each variable a. Intuitively, the symbolic state $\theta_1\langle\theta_2\rangle$ represents an overall effect
of symbolic execution of a code with effect $\theta_1$ followed by a code with effect $\theta_2$.
Finally, for vectors $\vec{u} = (u_1, \ldots, u_n)$ and $\vec{v} = (v_1, \ldots, v_n)$ we use $\vec{u} \le \vec{v}$ as an
abbreviation for $u_1 \le v_1 \wedge \ldots \wedge u_n \le v_n$.
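
The composition of symbolic states can be illustrated with a small sketch. It is not the paper's implementation: we restrict symbolic expressions to linear ones and represent them as dictionaries from terms to integer coefficients, where the term "1" stands for the constant part, a term "sym:x" for the symbol of variable x, and any other term for a path counter. These conventions and the function names are our own.

```python
def substitute(expr, theta):
    """Replace every symbol in the linear expression `expr` by its value in theta."""
    result = {}
    for term, coeff in expr.items():
        if term.startswith("sym:"):
            replacement = theta[term[len("sym:"):]]
        else:
            # constants and path counters are left untouched
            replacement = {term: 1}
        for t, c in replacement.items():
            result[t] = result.get(t, 0) + coeff * c
    return {t: c for t, c in result.items() if c != 0}


def compose(theta1, theta2):
    """Effect of code with effect theta1 followed by code with effect theta2."""
    return {var: substitute(expr, theta1) for var, expr in theta2.items()}


# Running example: i = 3 and k = 0 before the loop, then the loop summary.
theta1 = {"i": {"1": 3}, "k": {}}
theta_kappa = {
    "i": {"kappa1": 1, "kappa2": 1, "sym:i": 1},
    "k": {"kappa1": 1, "sym:k": 1},
}
theta2 = compose(theta1, theta_kappa)
```

Here `theta2` maps i to kappa1 + kappa2 + 3 and k to kappa1, which is the state after the loop computed in Section 2.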

3.2 The Algorithm 

Now we precisely formulate our algorithm computing a necessary condition $\varphi$
for reaching a target node of a given flowgraph. In contrast to the intuitive
description given in Section 2, we describe more details including computation
of loop summaries and dealing with nested loops.

To compute the necessary condition, we call Algorithm 1 on the set of all
backbones $\{\pi_1, \ldots, \pi_k\}$ of the given program. The algorithm performs a
modified symbolic execution of these backbones described later. We extract path
conditions $\varphi_1, \ldots, \varphi_k$ from the algorithm output and we compute the necessary
condition as

$$\varphi = \bigvee_{1 \le i \le k} \exists \vec{\kappa}_i \, ( \vec{\kappa}_i \ge \vec{0} \wedge \varphi_i ),$$

where $\vec{\kappa}_i$ is a vector of all path counters having a free occurrence in $\varphi_i$.

Algorithm 1 symbolically executes backbones. For each backbone $\pi_i$, we
first analyse all loops along it (see foreach loop at lines 3-9). Since we convert
loops into induced flowgraphs, we can analyse loops (and their nested loops) in
the same way as we analyse the original program (see line 7). The main part of
the loop analysis is the computation of loop summaries (see line 8) performed by
Algorithm 2 and discussed later. After analysis of loops along $\pi_i$, the backbone is
symbolically executed (see foreach loop at lines 13-21). The symbolic execution
differs from the original one only at loop entry locations, where we process the
computed loop summaries. For each backbone $\pi_i$, the algorithm computes a
symbolic state $\theta_i$ and a path condition $\varphi_i$.

It remains to discuss the computation of loop summaries. The procedure is
depicted in Algorithm 2. The algorithm introduces new path counters $\vec{\kappa}$ for all
backbones within the loop. Note that the algorithm knows the effect of each
backbone within the loop as it gets the corresponding symbolic states and path
conditions as input. The only task is to combine these symbolic states into an
iterated symbolic state $\theta^{\vec{\kappa}}$ and to assemble a looping condition $\varphi^{\vec{\kappa}}$.

The first half of the algorithm (see lines 2-10) computes the iterated state $\theta^{\vec{\kappa}}$.
To be on the safe side, we start with $\theta^{\vec{\kappa}}$ assigning ★ to all scalar variables. Then
we gradually improve the precision of $\theta^{\vec{\kappa}}$. The crucial step is the computation of
an improved value $e$ for a scalar variable a (see line 6). The value $e$ is defined
as ★ except for the following cases:

1. For each backbone $\pi'_i$, we have $\theta'_i(\mathtt{a}) = \underline{a}$. In other words, the value of a is
not changed in any iteration of the loop. This case is trivial. We set $e = \underline{a}$.

Algorithm 1: executeBackbones($\{\pi_1, \ldots, \pi_k\}$)

Input:
    $\{\pi_1, \ldots, \pi_k\}$  // backbone paths to be executed
Output:
    $\{(\pi_1, \theta_1, \varphi_1), \ldots, (\pi_k, \theta_k, \varphi_k)\}$  // result of symbolic execution of backbones

 1  result $\leftarrow \emptyset$
 2  foreach $i = 1, \ldots, k$ do
 3      foreach loop entry $v$ along $\pi_i$ do
 4          Let C be the loop at $v$
 5          Compute induced flowgraph $P(C, v)$ for the loop C at $v$
 6          Let $\{\pi'_1, \ldots, \pi'_l\}$ be all backbones in $P(C, v)$
 7          $\{(\pi'_1, \theta'_1, \varphi'_1), \ldots, (\pi'_l, \theta'_l, \varphi'_l)\} \leftarrow$ executeBackbones($\{\pi'_1, \ldots, \pi'_l\}$)
 8          $(\theta^{\vec{\kappa}}, \varphi^{\vec{\kappa}}) \leftarrow$ computeSummary($\{(\pi'_1, \theta'_1, \varphi'_1), \ldots, (\pi'_l, \theta'_l, \varphi'_l)\}$)
 9          Attach the summary $(\theta^{\vec{\kappa}}, \varphi^{\vec{\kappa}})$ to the loop entry $v$
10      Initialize $\theta_i$ to return $\underline{a}$ for each variable a
11      $\varphi_i \leftarrow \mathit{true}$
12      Let $\pi_i = v_1 \ldots v_n$
13      foreach $j = 1, \ldots, n - 1$ do
14          if $v_j$ is a loop entry then
15              Let $(\theta^{\vec{\kappa}}, \varphi^{\vec{\kappa}})$ be the summary attached to $v_j$
16              $\varphi_i \leftarrow \varphi_i \wedge \theta_i\langle\varphi^{\vec{\kappa}}\rangle$
17              $\theta_i \leftarrow \theta_i\langle\theta^{\vec{\kappa}}\rangle$
18          if $\iota((v_j, v_{j+1}))$ has the form assume($\gamma$) and $\theta_i(\gamma)$ contains no ★ then
19              $\varphi_i \leftarrow \varphi_i \wedge \theta_i(\gamma)$
20          if $\iota((v_j, v_{j+1}))$ has the form a $\leftarrow$ e then
21              $\theta_i \leftarrow \theta_i[\mathtt{a} \to \theta_i(\mathtt{e})]$
22      Insert triple $(\pi_i, \theta_i, \varphi_i)$ into result
23  return result



2. For each backbone $\pi'_i$, either $\theta'_i(\mathtt{a}) = \underline{a}$ or $\theta'_i(\mathtt{a}) = \underline{a} + d_i$ for some symbolic
expression $d_i$ such that $\theta^{\vec{\kappa}}\langle d_i \rangle$ contains neither ★ nor any path counters. Let
us assume that the latter possibility holds for backbones $\pi'_{i_1}, \ldots, \pi'_{i_m}$. The
condition on $\theta^{\vec{\kappa}}\langle d_i \rangle$ guarantees that the value of $d_i$ is constant during all
iterations over the loop. In this case, we set $e = \underline{a} + \sum_{1 \le j \le m} d_{i_j} \cdot \kappa_{i_j}$.

Note that one can easily add other cases covering other situations where the
value of a can be expressed precisely, e.g. the case capturing geometric
progression. Description of all the cases we have implemented is rather technical and
has no impact on the understanding of the algorithm. Some other cases can be found
in [26].
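
The closed form of case 2 can be checked on concrete numbers. The sketch below is our own illustration with hypothetical names: each backbone adds a fixed constant to the variable, the backbones are iterated in a shuffled order, and the result is compared with the initial value plus the sum of constants weighted by the path counters.

```python
import random


def iterate_loop(a0, increments, counts, seed=0):
    """Apply increments[j] exactly counts[j] times, in a shuffled order."""
    schedule = [j for j, count in enumerate(counts) for _ in range(count)]
    random.Random(seed).shuffle(schedule)
    a = a0
    for j in schedule:
        a += increments[j]
    return a


def closed_form(a0, increments, counts):
    """Value predicted by case 2: a0 plus the sum of d_j * kappa_j."""
    return a0 + sum(d * kappa for d, kappa in zip(increments, counts))
```

Whatever order the iterations are executed in, both functions agree, which is why the summary does not need to track the interleaving of backbones for such variables.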

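The second half of the algorithm (see lines 11-17) builds the looping condition $\varphi^{\vec{\kappa}}$.
The pseudo-code follows the description given in Section 2. The only extension
is handling of path counters $\vec{\kappa}'_i$ related to inner loops along a backbone $\pi'_i$. Since
an inner loop can be iterated a different number of times in each iteration of $\pi'_i$,

Algorithm 2: computeSummary($\{(\pi'_1, \theta'_1, \varphi'_1), \ldots, (\pi'_l, \theta'_l, \varphi'_l)\}$)

Input:
    $\{(\pi'_1, \theta'_1, \varphi'_1), \ldots, (\pi'_l, \theta'_l, \varphi'_l)\}$  // results from single execution of backbones
Output:
    $(\theta^{\vec{\kappa}}, \varphi^{\vec{\kappa}})$  // the computed summary

 1  Introduce fresh path counters $\vec{\kappa} = (\kappa_1, \ldots, \kappa_l)$ for $\pi'_1, \ldots, \pi'_l$ respectively
 2  Initialize $\theta^{\vec{\kappa}}$ to return ★ for each scalar variable
 3  repeat
 4      change $\leftarrow$ false
 5      foreach scalar variable a do
 6          Compute an improved value $e$ for the variable a from $\theta'_1, \ldots, \theta'_l$ and $\theta^{\vec{\kappa}}$
 7          if $e \ne$ ★ $\wedge$ $\theta^{\vec{\kappa}}(\mathtt{a}) =$ ★ then
 8              $\theta^{\vec{\kappa}} \leftarrow \theta^{\vec{\kappa}}[\mathtt{a} \to e]$
 9              change $\leftarrow$ true
10  until change = false
11  foreach $i = 1, \ldots, l$ do
12      Let $\vec{\kappa}'_i$ be a vector of all path counters having a free occurrence in $\varphi'_i$
13      $\gamma'_i \leftarrow (\theta^{\vec{\kappa}}\langle\varphi'_i\rangle)[\vec{\kappa}/\vec{\tau}]$, where $\vec{\tau} = (\tau_1, \ldots, \tau_l)$
14      Remove all predicates of $\gamma'_i$ containing ★
15      Let $\vec{\kappa}_i = (\kappa_1, \ldots, \kappa_{i-1}, \kappa_{i+1}, \ldots, \kappa_l)$ and $\vec{\tau}_i = (\tau_1, \ldots, \tau_{i-1}, \tau_{i+1}, \ldots, \tau_l)$
16      $\psi_i \leftarrow \forall \tau_i \, ( 0 \le \tau_i < \kappa_i \rightarrow \exists \vec{\tau}_i \, ( \vec{0} \le \vec{\tau}_i \le \vec{\kappa}_i \wedge \exists \vec{\kappa}'_i \, ( \vec{\kappa}'_i \ge \vec{0} \wedge \gamma'_i )))$
17  $\varphi^{\vec{\kappa}} \leftarrow \psi_1 \wedge \ldots \wedge \psi_l$
18  return $(\theta^{\vec{\kappa}}, \varphi^{\vec{\kappa}})$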



we have to existentially quantify the corresponding path counters in the looping
condition $\varphi^{\vec{\kappa}}$.

In real code, the number of iterations of an inner loop is often the same in
each iteration of the outer loop. Or there may be some simple relation between
the number of iterations of an inner loop and the current value of the path
counter related to the outer loop. Capturing this relation can greatly improve
the precision of a loop summary. Therefore, we have developed a 'heavyweight'
version of Algorithm 2, where we try to find linear relations between inner and
outer path counters using an SMT solver. For details see [26].

3.3 Soundness and Incompleteness

We finish this section by formulating soundness and incompleteness theorems
for our algorithm.

Theorem 1 (Soundness) Let $\varphi$ be the necessary condition computed by our
algorithm for a given program and a given target location. If $\varphi$ is not satisfiable,
then the target location is not reachable in that program.

An intuitive proof can be found in [26].

Theorem 2 (Incompleteness) There is a program and an unreachable target
location in it for which the formula $\varphi$ computed by our algorithm is satisfiable.

Proof. Let us consider the following C code:

    int i = 1; while (i < 3) { if (i == 2) i = 1; else i = 2; }

The loop never terminates. Therefore, a program location below it is not
reachable. But $\varphi$ computed for that location is equal to true, since variable i does
not follow a monotone progression. □

4 Dealing with Quantifiers 

We can ask an SMT solver whether a computed necessary condition $\varphi$ is
satisfiable or not. If it is, we may further ask for a model. As we will see in
Section 5, such queries to a solver should be fast. Unfortunately, our experience with
solvers shows that presence of quantifiers in $\varphi$ usually causes performance issues.
To overcome this issue, we introduce a transformation of $\varphi$ into a quantifier-free
formula $\varphi^K$ that is implied by $\varphi$ and thus remains necessary. The transformation
is parametrized by $K > 0$.

One can immediately see that all universal quantifiers in $\varphi$ come from
formulae $\psi_i$ of line 16 of Algorithm 2. Each formula $\psi_i$ has the form

$$\psi_i = \forall \tau_i \, ( 0 \le \tau_i < \kappa_i \rightarrow \rho(\tau_i) ).$$

Clearly, the formula is equivalent to $\bigwedge_{0 \le \tau_i < \kappa_i} \rho(\tau_i)$. We do not know the value
of $\kappa_i$, but we can weaken the formula to check only the first $K$ instances of $\rho(\tau_i)$.
In other words, we replace each $\psi_i$ in $\varphi$ by a weaker formula

$$\psi_i^K = \bigwedge_{0 \le \tau_i < K} ( \tau_i < \kappa_i \rightarrow \rho(\tau_i) ).$$

Having eliminated all universal quantifiers, we can also eliminate existential
quantification of all $\vec{\kappa}_i$ and all $\vec{\tau}_i$ by redefining them as uninterpreted integer
constants. The resulting formula is denoted as $\varphi^K$.

Let us note that the choice of $K$ affects the length and precision of $\varphi^K$: the
higher value of $K$ we choose, the stronger and longer formula $\varphi^K$ we get.
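
The effect of the bound $K$ can be seen on a toy instance. In this sketch, which is our own illustration and not part of the described tool, the body of the universal quantifier is modelled by an arbitrary Python predicate `rho`, and the two functions evaluate the original and the weakened formula for a concrete value of the path counter.

```python
def psi(kappa, rho):
    """Original formula: rho holds for every tau with 0 <= tau < kappa."""
    return all(rho(tau) for tau in range(kappa))


def psi_K(kappa, rho, K):
    """Weakened formula: only the first K instances are checked."""
    return all((not tau < kappa) or rho(tau) for tau in range(K))


def fails_at_five(tau):
    """Toy body of the quantifier: violated only in the sixth iteration."""
    return tau != 5
```

With `kappa = 10` the original formula is false, the weakened one with `K = 3` is still true (the violation lies beyond the checked prefix), and with `K = 6` it becomes false as well. This matches the remark that a higher $K$ gives a stronger formula.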

In some cases, an SMT solver decides satisfiability of $\varphi$ very quickly: even
in a shorter time than needed for transformation of $\varphi$ into $\varphi^K$. In practice, we
ask the solver for satisfiability of $\varphi$ and, in parallel, we transform $\varphi$ into $\varphi^K$ and
then ask the solver for satisfiability of $\varphi^K$. We take the faster answer.

5 Integration into Tools 

Tools typically explore program paths iteratively. At each iteration there is a
set of program locations $\{v_1, \ldots, v_k\}$, from which the symbolic execution may
continue further. At the beginning, the set contains only the program entry location.
In each iteration of the symbolic execution the set is updated such that actions
of program edges going out from some locations $v_i$ are symbolically executed.
Different tools use different systematic and heuristic strategies for selecting
locations $v_i$ to be processed in the current iteration. It is also important to note
that for each $v_i$ there is available an actual path condition $\varphi_i$ capturing already
taken symbolic execution from the entry location up to $v_i$.

When a tool detects difficulties to cover a particular program location, then
using $\varphi$ it can restrict selection from the whole set $\{v_1, \ldots, v_k\}$ to only those
locations $v_i$ for which the formula $\varphi_i \wedge \varphi$ is satisfiable. In other words, if for some
$v_i$ the formula $\varphi_i \wedge \varphi$ is not satisfiable, then we are guaranteed there is no real
path from $v_i$ to the target location. Therefore, $v_i$ can safely be removed from
consideration.
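
The pruning step can be sketched as a filter over the current set of locations. Everything in this sketch is hypothetical: formulas are modelled as Python predicates over a single integer input, and `brute_force_sat` is a toy stand-in for a real SMT query, used only so that the example is self-contained.

```python
def brute_force_sat(formula, domain=range(-50, 51)):
    """Toy satisfiability check: try every input from a small finite domain."""
    return any(formula(x) for x in domain)


def prune_locations(frontier, necessary, is_satisfiable=brute_force_sat):
    """Keep the locations whose path condition is consistent with `necessary`.

    `frontier` is a list of (location, path_condition) pairs.
    """
    return [
        (location, path_condition)
        for location, path_condition in frontier
        if is_satisfiable(lambda x, pc=path_condition: pc(x) and necessary(x))
    ]


# Toy data: the target needs n > 15, location v1 was reached under n < 10.
frontier = [("v1", lambda n: n < 10), ("v2", lambda n: n >= 10)]
kept = prune_locations(frontier, lambda n: n > 15)
```

Only v2 survives here, since no input satisfies both n < 10 and n > 15. In a real tool the check would be delegated to an SMT solver, and a model of the conjunction would give a concrete input to continue from the kept location.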

Tools like Sage, Pex or Cute combine symbolic execution with concrete
execution. Let us assume that a location $v_i$, for which the formula $\varphi_i \wedge \varphi$ is satisfiable,
was selected in the current iteration. These tools require a concrete input to the
program to proceed further from $v_i$. Such an input can directly be extracted
from any model of the formula $\varphi_i \wedge \varphi$.

6 Experimental Results 

We have implemented the algorithm (employing the mentioned heavyweight
version of Algorithm 2, extended to handle modifiable arrays, and slightly
optimized) in an experimental tool called Apc. We also prepared a small set of
benchmark programs mostly taken from other papers. In each benchmark we
marked a selected location as the target one. All the benchmarks have a huge
number of paths, so it is difficult to reach the target. We ran Pex and Apc on
the benchmarks and we measured times until the target locations were reached.
This measurement is obviously unfair from the perspective of Pex, since its task is
to cover an analysed benchmark by tests and not to reach a single particular
location in it. Therefore, we clarify the right meaning of the measurement now.

Our only goal here is to show that Pex could benefit from our algorithm.
A typical scenario when running Pex on a benchmark is that all the code except
the target location is covered in a few seconds (typically up to three). Then Pex
keeps searching the space of program paths for a longer time without covering the
target location. In contrast, Apc only builds a necessary condition and asks
an SMT solver for its satisfiability. If we want to show that Pex could benefit
from the algorithm, then Apc must be significantly faster than Pex on several
benchmarks.

Before we present the results, we discuss the benchmarks. Benchmark HWM
taken from [1] checks whether an input string contains four substrings: Hello,
world, at and Microsoft!. It does not matter at which position and in which
order the words occur in the string. The target location can be reached only
when all the words are present in the string. The benchmark consists of four
loops in a sequence, where each loop searches for one of the four subwords. Each
loop traverses the input string from the beginning and at each position it runs
a nested loop checking whether the subword starts at this position. Benchmark
HWM is the most complicated one from our set of benchmarks. We also took
its two lightened versions presented in [20]: benchmark HW searches the input
string only for subwords Hello and world, while benchmark Hello searches only
for the first one.

Table 1. Running times of Pex and Apc on benchmarks.

Benchmark MatrlR scans upper triangle of an input matrix. The target lo- 
cation is reached if the matrix is bigger than 20 x 20 and it contains a line with 
more than 15 scanned elements between 10 and 100. 

Benchmarks OneLoop and TwoLoops originate from [20]. They are designed
such that their target locations are not reachable. Both benchmarks contain
a loop iterated n times. In each iteration, the variable i (initially set to 0) is
increased by 4. The target location is then guarded by an assertion i==15 in
OneLoop and by a loop while (i != j + 7) j += 2 in TwoLoops (j is
initialized to 0 before the loop).
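
As a worked illustration of why these targets are unreachable, one can apply the constant-increment rule of Section 3.2 to the benchmark description. This is our own reading of the description, not a derivation taken from the paper, and the counter names $\kappa$, $\kappa_1$, $\kappa_2$ are chosen here for illustration only.

```latex
% OneLoop: i starts at 0 and grows by 4 in each of the \kappa iterations.
\theta^{\vec{\kappa}}(\mathtt{i}) = 4\kappa,
\qquad 4\kappa = 15 \ \text{has no solution with } \kappa \in \mathbb{Z},\ \kappa \ge 0.

% TwoLoops: j starts at 0 and grows by 2 in each of the \kappa_2 iterations
% of the second loop; leaving that loop requires i = j + 7.
4\kappa_1 = 2\kappa_2 + 7 \ \text{has no integer solution,}
\ \text{since the left side is even and the right side is odd.}
```

Under these assumptions the necessary condition is unsatisfiable in both cases, which is consistent with the benchmarks being designed to have unreachable targets.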

The last benchmark WinDriver comes from practice and we took it from [12].
It is a part of a Windows driver processing a stream of network packets. It reads
an input stream and decomposes it into a two-dimensional array of packets. The
position in the array into which the data from the stream are copied is encoded
in the input stream itself. We marked the target location as a failure branch of
a consistency check of the filled-in array. It was discussed in the paper [12] that
the consistency check can be broken. For complete benchmark listings see [26].

The experimental results are depicted in Table 1. They show running times in
seconds of Pex and Apc on the benchmarks. We did all the measurements on a
single common desktop computer¹. The mark T/O in the Pex column indicates that
it failed to reach the target location within an hour. For Apc we provide the total
running times and also time profiles of different parts of the computation. In
sub-column 'Bld $\varphi$' there are times required to build the necessary condition $\varphi$.
In sub-column 'Trans + SMT $\varphi^K$' there are two times for each benchmark. The
first number is the time spent by transformation of $\varphi$ into $\varphi^K$. The second
number represents the time spent by the Z3 SMT solver [29] to decide satisfiability of
the formula $\varphi^K$. Characters after these times identify results of the queries:
S for satisfiable, U for unsatisfiable and X for unknown. The last sub-column
'SMT $\varphi$' contains running times of the Z3 SMT solver directly on formulae $\varphi$. The
mark M/O means that Z3 went out of memory. As we explained in Section 4, the
construction and satisfiability checking of $\varphi^K$ runs in parallel with satisfiability
checking of $\varphi$. Therefore, we take the minimum of the times to compute the total
running time of Apc. The faster variant is written in boldface.

¹ Intel® Core™ i7 CPU 920 @ 2x2.67GHz, 6GB RAM, Windows 7 Professional 64-bit,
MS Pex 0.92.50603.1, MS Moles 1.0.0.0, MS Visual Studio 2008, MS .NET
Framework v3.5 SP1, MS Z3 SMT solver v3.2, and boost v1.42.0.

For all benchmarks, the computed necessary conditions φ are also sufficient. 
Thus, models of formulae produced by Z3 SMT solver directly describe tests 
covering the target locations. 

7 Related Work 

Early work on symbolic execution [18, 17] showed its effectiveness in test genera- 
tion. King [18] further showed that symbolic execution can bring more automa- 
tion into Floyd's inductive proving method [6]. Nevertheless, loops as the source 
of the path explosion problem were not in the center of interest. 

More recent approaches dealt mostly with limitations of SMT solvers and 
the environment problem by combining the symbolic execution with the con- 
crete one [9, 1, 25, 10, 7, 11, 8, 27, 11, 21]. Although practical usability of the sym- 
bolic execution improved, these approaches still suffer from the path explosion 
problem. An interesting idea is to combine the symbolic execution with a com- 
plementary technique [14, 16, 2, 19, 15]. Complementary techniques typically per- 
form differently on different parts of the analysed program. Therefore, an infor- 
mation exchange between the techniques leads to a mutual improvement of their 
performance. There are also techniques that record already observed program
behaviour and terminate early those executions whose further progress would not
explore any new behaviour [3, 5, 4]. Compositional approaches are typically
based on the computation of function summaries [7, 1]. A function summary often
consists of a precondition and a postcondition. Preconditions identify paths
through the function and postconditions capture effects of the function along
those paths. Reusing these summaries at call sites typically leads to an
interesting performance improvement. Moreover, the summaries may insert
additional symbolic values into the path condition, which brings a further
improvement. There are also techniques that partition program paths into
separate classes according to similarities in program states [22, 23]. Values of
output variables of a program or function are typically used as the
partitioning criterion. A search strategy Fitnex [28]
implemented in Pex [27] guides a path exploration to a particular target location 
using a fitness function. The function measures how close an already discovered 
feasible path is to the target. 

Although the techniques above showed performance improvements when deal- 
ing with the path explosion problem, they do not focus directly on loops. The 
LESE [24] approach introduces symbolic variables for the number of times each
loop was executed. These variables are related to our path counters, but the
path counters provide finer information as they are associated with iterations
along individual paths through a loop. LESE links the symbolic variables with
features of a known grammar generating inputs. Using these links, the grammar
can control the numbers of loop iterations performed on a generated input. A technique
presented in [13] analyses loops on-the-fly, i.e. during simultaneous concrete and 
symbolic execution of a program for a concrete input. The loop analysis infers
induction variables, i.e. variables that are modified by a constant value in each
loop iteration. These variables are used to build loop summaries expressed in the
form of pre- and postconditions. The summaries are derived from the partial loop
invariants synthesized dynamically using pattern matching rules on the loop 
guards and induction variables. The algorithm presented in [20] shares exactly 
the same goal as this paper: to reach a given target location. For each pair of 
acyclic paths around a loop, the technique introduces an artificial counter that
keeps track of the number of iterations around one path since the last
iteration around the other path. Values of program variables are expressed using
these counters. Predicates on paths to the target are used to build constraint 
systems on the counters. Solutions of the systems guide symbolic execution to 
the target. 
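
To illustrate what a loop summary over an induction variable can look like, the
sketch below compares a concrete loop with a closed-form description of its
effect. The loop, the names run_loop and summarise_loop, and the step 3 are
hypothetical and are not taken from [13]; the only assumption carried over from
the text is that the variable changes by a constant in each iteration.

```python
STEP = 3  # hypothetical constant added to the induction variable per iteration


def run_loop(i0, n):
    """Concrete execution of: while i < n: i += STEP."""
    i, iterations = i0, 0
    while i < n:
        i += STEP
        iterations += 1
    return i, iterations


def summarise_loop(i0, n):
    """Closed-form summary of the same loop.

    The iteration count is k = max(0, ceil((n - i0) / STEP)) and the final
    value is i0 + STEP * k, so no iteration has to be executed.
    """
    k = max(0, -((i0 - n) // STEP))  # integer form of ceil((n - i0) / STEP)
    return i0 + STEP * k, k


# The summary agrees with concrete execution on a small range of inputs.
for i0 in range(-5, 6):
    for n in range(-5, 15):
        assert run_loop(i0, n) == summarise_loop(i0, n)
```

With such a summary, a symbolic executor could reason about the loop exit state
in terms of the iteration count k instead of unrolling the loop path by path.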

8 Conclusion 

We presented an algorithm computing a necessary condition φ that represents 
an over-approximated set containing all real program paths leading to a given 
target program location. We proposed the use of φ in test-generation tools based 
on symbolic execution. Such a tool can cover the target location faster by using φ 
to explore only program paths in the over-approximated set. We also showed that 
φ can be used in the tools very easily and naturally. Finally, our experimental 
results indicate that Pex could benefit from our algorithm. 
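
The pruning effect of a necessary condition can be seen on a toy example. The
sketch below is hypothetical throughout: the program, the hand-written condition
and the input grid are assumptions made for illustration and are not produced by
the presented algorithm. Brute-force enumeration stands in for symbolic
execution and SMT solving.

```python
from itertools import product


def reaches_target(x, y):
    """Toy program: the target is the branch taken when i == y after the loop."""
    i = 0
    while i < x:
        i += 2
    return i == y


def necessary_condition(x, y):
    """Hand-written necessary condition for reaching the target.

    The loop starts at 0 and adds 2, so on exit i is even, non-negative and
    at least x. Any (x, y) that reaches the target must satisfy all three.
    """
    return y % 2 == 0 and y >= 0 and y >= x


inputs = list(product(range(-4, 9), repeat=2))
candidates = [p for p in inputs if necessary_condition(*p)]
hits = [p for p in inputs if reaches_target(*p)]

# Every input that reaches the target lies in the over-approximated set,
# so a search restricted to the candidates cannot miss the target.
assert set(hits) <= set(candidates)
# The condition prunes part of the input space.
assert len(candidates) < len(inputs)
# It is necessary but not sufficient here: (1, 4) passes it yet misses.
assert necessary_condition(1, 4) and not reaches_target(1, 4)
```

In this toy case the condition is only an over-approximation, unlike on the
benchmarks above, where the computed conditions were also sufficient.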